malicious instruction
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- (4 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Government (0.93)
- Asia > Middle East > Republic of Türkiye (0.04)
- Asia > China > Heilongjiang Province > Harbin (0.04)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Banking & Finance (0.67)
- Law (0.67)
MoGU: A Framework for Enhancing Safety of LLMs While Preserving Their Usability
Large Language Models (LLMs) are increasingly deployed in various applications. As their usage grows, concerns regarding their safety are rising, especially in maintaining harmless responses when faced with malicious instructions. Many defense strategies have been developed to enhance the safety of LLMs. However, our research finds that existing defense strategies lead LLMs to predominantly adopt a rejection-oriented stance, thereby diminishing the usability of their responses to benign instructions. To solve this problem, we introduce the MoGU framework, designed to enhance LLMs' safety while preserving their usability. Our MoGU framework transforms the base LLM into two variants: the usable LLM and the safe LLM, and further employs dynamic routing to balance their contribution. When encountering malicious instructions, the router will assign a higher weight to the safe LLM to ensure that responses are harmless.
The Shawshank Redemption of Embodied AI: Understanding and Benchmarking Indirect Environmental Jailbreaks
Li, Chunyang, Kang, Zifeng, Zhang, Junwei, Ma, Zhuo, Cheng, Anda, Li, Xinghua, Ma, Jianfeng
The adoption of Vision-Language Models (VLMs) in embodied AI agents, while being effective, brings safety concerns such as jailbreaking. Prior work have explored the possibility of directly jailbreaking the embodied agents through elaborated multi-modal prompts. However, no prior work has studied or even reported indirect jailbreaks in embodied AI, where a black-box attacker induces a jailbreak without issuing direct prompts to the embodied agent. In this paper, we propose, for the first time, indirect environmental jailbreak (IEJ), a novel attack to jailbreak embodied AI via indirect prompt injected into the environment, such as malicious instructions written on a wall. Our key insight is that embodied AI does not ''think twice'' about the instructions provided by the environment -- a blind trust that attackers can exploit to jailbreak the embodied agent. We further design and implement open-source prototypes of two fully-automated frameworks: SHAWSHANK, the first automatic attack generation framework for the proposed attack IEJ; and SHAWSHANK-FORGE, the first automatic benchmark generation framework for IEJ. Then, using SHAWSHANK-FORGE, we automatically construct SHAWSHANK-BENCH, the first benchmark for indirectly jailbreaking embodied agents. Together, our two frameworks and one benchmark answer the questions of what content can be used for malicious IEJ instructions, where they should be placed, and how IEJ can be systematically evaluated. Evaluation results show that SHAWSHANK outperforms eleven existing methods across 3,957 task-scene combinations and compromises all six tested VLMs. Furthermore, current defenses only partially mitigate our attack, and we have responsibly disclosed our findings to all affected VLM vendors.
Amazon Explains How Its AWS Outage Took Down the Web
Plus: The Jaguar Land Rover hack sets an expensive new record, OpenAI's new Atlas browser raises security fears, Starlink cuts off scam compounds, and more. The cloud giant Amazon Web Services experienced DNS resolution issues on Monday leading to cascading outages that took down wide swaths of the web . Monday's meltdown illustrated the world's fundamental reliance on so-called hyperscalers like AWS and the challenges for major cloud providers and their customers alike when things go awry . See below for more about how the outage occurred. US Justice Department indictments in a mob-fueled gambling scam reverberated through the NBA on Thursday.
- Asia > Middle East > Palestine (0.14)
- Asia > Myanmar (0.06)
- North America > United States > California > San Francisco County > San Francisco (0.05)
- (9 more...)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law (1.00)
- Information Technology > Services (1.00)
- (2 more...)
- Information Technology > Communications > Web (0.91)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.53)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.52)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.38)
In-Browser LLM-Guided Fuzzing for Real-Time Prompt Injection Testing in Agentic AI Browsers
AI-powered browser assistants (also known as autonomous browsing agents or agentic AI browsers) are emerging tools that use LLMs to help users navigate and interact with web content. For example, an AI agent can be instructed to summarize a webpage or perform actions like clicking links and filling forms on behalf of the user. While these agents promise enhanced productivity, they also introduce new security risks. One major risk is prompt injection, where an attacker embeds malicious instructions into web content that the agent will process [5]. Crucially, such instructions can be hidden from the human user (e.g., invisible text, HTML comments) yet still parsed by the LLM, causing it to alter its behavior in unintended ways [10]. In effect, the agent can be tricked into executing the attacker's commands rather than the user's, leading to potentially severe consequences [2]. Indirect prompt injections have been demonstrated in real-world scenarios.
- Information Technology > Security & Privacy (1.00)
- Energy (0.67)
- North America > United States > California > Orange County > Mission Viejo (0.04)
- Asia > China (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- (2 more...)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- (4 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Government (0.93)
- Asia > Middle East > Republic of Türkiye (0.04)
- Asia > China > Heilongjiang Province > Harbin (0.04)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Banking & Finance (0.67)
- Law (0.67)
WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents
Liu, Yinuo, Xu, Ruohan, Wang, Xilong, Jia, Yuqi, Gong, Neil Zhenqiang
Multiple prompt injection attacks have been proposed against web agents. At the same time, various methods have been developed to detect general prompt injection attacks, but none have been systematically evaluated for web agents. In this work, we bridge this gap by presenting the first comprehensive benchmark study on detecting prompt injection attacks targeting web agents. We begin by introducing a fine-grained categorization of such attacks based on the threat model. We then construct datasets containing both malicious and benign samples: malicious text segments generated by different attacks, benign text segments from four categories, malicious images produced by attacks, and benign images from two categories. Next, we systematize both text-based and image-based detection methods. Finally, we evaluate their performance across multiple scenarios. Our key findings show that while some detectors can identify attacks that rely on explicit textual instructions or visible image perturbations with moderate to high accuracy, they largely fail against attacks that omit explicit instructions or employ imperceptible perturbations. Our datasets and code are released at: https://github.com/Norrrrrrr-lyn/WAInjectBench.